All Questions
Tagged with class-imbalance imbalanced-learn
34 questions
0 votes
1 answer
40 views
SVC labels entire sample majority class, even after using ADASYN
I have an imbalanced sample (850 in group X vs 100 in group Y). I am trying to predict group membership using support vector classification. I am using 'Adaptive Synthetic' (ADASYN) to oversample the ...
4 votes
0 answers
69 views
How do you know that your classifier is suffering from class imbalance?
Inspired by @Dave's question "Why does data science see class imbalance as a problem for supervised learning when statistics does not?", I am re-posting a question I posed on the stats SE to ...
6 votes
3 answers
296 views
Reproducible examples where balancing the training data demonstrably improves accuracy
I asked this question on the Statistics SE, but there were no answers, even when a modest bonus was available, so I am asking here to see if any examples can be given. I have been looking into the ...
1 vote
1 answer
1k views
I used SMOTE-ENN to balance my dataset and it improved the performance metrics, but how can I be sure it's not overfitting?
The models were evaluated using 10-fold cross validation. foldCount = StratifiedKFold(10, shuffle=True, random_state=1) The models in question are XGBoost. ...
2 votes
2 answers
2k views
How to calculate accuracy of an imbalanced dataset
I would like to understand what the accuracy of an imbalanced dataset is. Let's suppose we have a medical dataset and we want to predict the disease among the patients. Say, in an existing dataset 95% of ...
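As a rough illustration of why this question comes up, here is a minimal sketch in plain Python. The excerpt above is truncated, so the numbers are an assumption made only for this example: a hypothetical set of 1,000 patients with 950 healthy (label 0) and 50 diseased (label 1), and a trivial "model" that always predicts the majority class. The helper functions are hypothetical and written out by hand rather than taken from any library.

```python
# Hypothetical labels (illustrative numbers only, not from the question):
# 950 healthy patients (0) and 50 diseased patients (1).
y_true = [0] * 950 + [1] * 50

# A trivial "model" that always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of samples whose true class is `label` that were predicted as `label`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in relevant) / len(relevant)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    labels = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in labels) / len(labels)


acc = accuracy(y_true, y_pred)
rec_pos = recall(y_true, y_pred, 1)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f} recall(diseased)={rec_pos:.2f} balanced accuracy={bal_acc:.2f}")
```

Under these assumptions the plain accuracy looks high even though no diseased patient is detected, which is why recall or balanced accuracy is often reported alongside it.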
0 votes
1 answer
94 views
Do I need to use AUPRC for reporting classification results on an imbalanced dataset when the model was trained using upsampling and CV
I am working on a binary classification problem whose dataset has about 5% positive-class samples. I split the dataset, 70% for training and 30% for testing. I used the test data only once for ...
0 votes
1 answer
130 views
How to effectively evaluate a model with a highly imbalanced and limited dataset
Most data imbalance questions on this stack have been asking How to learn a better model, but I tend to think one other problem is How do we define "better" (i.e. fairly evaluate the learned ...
1 vote
1 answer
431 views
Class imbalance: Will transforming multi-label (aka multi-task) to multi-class problem help?
I noticed this and this questions, but my problem is more about class imbalance. So now I have, say, 1000 targets and some input samples (with some feature vectors). Each input sample can have label ...
0 votes
1 answer
78 views
Give more weight to features based on distribution plot
I have a task to predict a binary variable, purchase. The dataset is strongly imbalanced (10:100), and the models I have tried so far (mostly ensemble) fail. In ...
0 votes
1 answer
78 views
Over-sampling when predicting a continuous variable
Let's say I am predicting house selling prices (continuous) and therefore have multiple independent variables (numerical and categorical). Is it common practice to balance the dataset when the ...
0 votes
1 answer
249 views
Explaining the logic behind the pipe_line method for cross-validation of imbalanced datasets
Reading the following article: https://kiwidamien.github.io/how-to-do-cross-validation-when-upsampling-data.html There is an explanation of how to use ...
0 votes
1 answer
2k views
Handling Imbalanced Datasets in Orange
I work in the medical domain, so class imbalance is the rule and not the exception. While I know Python has packages for class imbalance, I don't see an option in Orange for e.g. a SMOTE widget. I ...
3 votes
1 answer
829 views
What does IBA mean in imblearn classification report?
imblearn is a Python library for handling imbalanced data. Code for generating a classification report is given below. ...
2 votes
1 answer
3k views
Using SMOTENC in a pipeline
I am trying to figure out the appropriate way to build a pipeline to train a model which includes using the SMOTENC algorithm: Given that the N-Nearest Neighbors algorithm and Euclidean distance are ...
1 vote
2 answers
808 views
Cross validation schema for imbalanced dataset
Based on a previous post, I understand the need to ensure that the validation folds during the CV process have the same imbalanced distribution as the original dataset when training a binary ...
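The idea mentioned in this excerpt, keeping validation folds at the original class ratio while rebalancing only the training data, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the asker's setup: the 900:100 label split, the helper names `stratified_folds` and `oversample`, and the use of simple random oversampling (standing in for SMOTE or similar methods) are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds that each keep roughly the original class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal each class round-robin across folds
    return folds


def oversample(indices, labels, seed=0):
    """Random oversampling: repeat minority-class indices until classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(v) for v in by_class.values())
    out = []
    for idxs in by_class.values():
        out.extend(idxs)
        out.extend(rng.choices(idxs, k=target - len(idxs)))
    return out


# Hypothetical imbalanced labels: 900 negatives and 100 positives.
labels = [0] * 900 + [1] * 100
folds = stratified_folds(labels, k=5)

summary = []
for f, val_idx in enumerate(folds):
    # Training indices are all folds except the current validation fold.
    train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
    # Resample the training part only; the validation fold is left untouched.
    train_idx = oversample(train_idx, labels, seed=f)
    val_counts = dict(Counter(labels[i] for i in val_idx))
    train_counts = dict(Counter(labels[i] for i in train_idx))
    summary.append((val_counts, train_counts))
    print(f"fold {f}: validation={val_counts} training={train_counts}")
```

Because resampling happens inside the loop, after the split, no duplicated training sample can leak into the validation fold, and validation scores are measured on data that still reflects the original imbalance.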